LAF: a new XML encoding and indexing strategy for keyword-based XML search
Abstract
As a large number of corpora are represented, stored and published in XML format, how to find useful information in XML databases has become an increasingly important issue. Keyword search enables web users to easily access XML data without the need to learn a structured query language or to study complex data schemas. Most existing indexing strategies for XML keyword search are based upon Dewey encoding. In this paper, we propose a new encoding method called Level Order and Father (LAF) for XML documents. With LAF encoding, we devise a new index structure, called the two-layer LAF inverted index, which can greatly decrease the space complexity compared with a Dewey encoding-based inverted index. Furthermore, with the two-layer LAF inverted index, we propose a new keyword query algorithm called Algorithm based on Binary Search (ABS) that can quickly find all Smallest Lowest Common Ancestor (SLCA) nodes. We experimentally evaluate the two-layer LAF inverted index and the ABS algorithm on four real XML data sets selected from Wikipedia. The experimental results demonstrate the advantages of our index method and querying algorithm. The space consumed by the two-layer LAF index is less than half of that consumed by the Dewey inverted index. Moreover, ABS is about one to two orders of magnitude faster than the classic Stack algorithm.
Concurrency and Computation: Practice and Experience, 2012. © 2012 Wiley Periodicals, Inc. Received 19 January 2011; Revised 27 June 2012; Accepted 27 June 2012
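The abstract does not spell out how LAF codes are built, so the following is only a minimal sketch of one plausible reading of "Level Order and Father": each node gets its breadth-first (level-order) number plus the number of its father. The function names `laf_encode` and `lca`, the toy tree, and the use of 0 as the root's father are assumptions made for illustration. They are not taken from the paper, and the sketch does not implement the two-layer index or the ABS algorithm.

```python
from collections import deque

def laf_encode(tree, root):
    """Assign level-order ids (BFS order, root = 1) and record each node's father id.

    `tree` maps a node label to the list of its children (hypothetical input format).
    """
    ids = {root: 1}
    father = {1: 0}  # assumption: 0 marks "no father" for the root
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in tree.get(node, []):
            ids[child] = len(ids) + 1        # next free level-order number
            father[ids[child]] = ids[node]
            queue.append(child)
    return ids, father

def lca(father, a, b):
    """Lowest common ancestor of two level-order ids.

    In level order a father always has a smaller id than its children, so the
    larger of the two ids can never be the ancestor: replace it by its father.
    """
    while a != b:
        if a > b:
            a = father[a]
        else:
            b = father[b]
    return a

# Toy document tree with made-up, unique node labels.
tree = {
    "bib": ["book", "article"],
    "book": ["title", "author"],
    "article": ["title2"],
}
ids, father = laf_encode(tree, "bib")
print(ids)
print(lca(father, ids["title"], ids["author"]))   # both under "book"
print(lca(father, ids["title"], ids["title2"]))   # only share the root
```

Under this reading every node stores two integers regardless of its depth, whereas a Dewey label grows with the depth of the node. That is consistent with the space saving the abstract reports, although the actual index layout in the paper may differ.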
Similar resources
A Search Algorithm Oriented to XML Keywords
A core issue in XML keyword search is how to put the hierarchical structure information of XML data into the index so that it supports efficient keyword search algorithms. This paper proposes a new retrieval algorithm based on LAF coding: HBA, a bottom-up XML keyword search algorithm that can effectively support a variety of search semantic models....
XML based Keyword Search
The success of information retrieval style keyword search on the web has led to the emergence of XML based keyword search. The differences between text databases and XML databases lead to three new challenges: 1) The user's search intention must be identified, i.e., the XML node types that the user wants to search for and search via must be identified. 2) The similarities in tag name, tag value and the structure...
Schema-Independence in XML Keyword Search
XML keyword search has attracted a lot of interest, with typical search based on lowest common ancestor (LCA). However, in this paper, we show that meaningful answers can be found beyond LCA and should be independent from schema designs of the same data content. Therefore, we propose a new semantics, called CR (Common Relative), which not only can find more answers beyond LCA, but the returned ...
SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents
Keyword search in XML documents has recently gained a lot of research attention. Given a keyword query, existing approaches first compute the lowest common ancestors (LCAs) or their variants of XML elements that contain the input keywords, and then identify the subtrees rooted at the LCAs as the answer. In this paper we study how to use the rich structural relationships embedded in XML docu...
eXist: An Open Source Native XML Database
With the advent of native and XML enabled database systems, techniques for efficiently storing, indexing and querying large collections of XML documents have become an important research topic. This paper presents the storage, indexing and query processing architecture of eXist, an Open Source native XML database system. eXist is tightly integrated with existing tools and covers most of the nat...
Journal title:
- Concurrency and Computation: Practice and Experience
Volume 25 Issue
Pages -
Publication date 2013